Exploiting Named Entity Mentions Towards Code Mixed IR: Working Notes for the UB system submission for MSIR@FIRE-2016

Authors

Nikhil Londhe

Rohini K. Srihari

Abstract

A sizable percentage of online user-generated content is prone to code switching and code mixing for a variety of reasons. An expected consequence is that ad hoc user queries on such data are also inherently code mixed. This paper presents our solution for one such scenario: information retrieval on code-mixed Hindi-English tweets. We explore techniques in information extraction, clustering, and query expansion as part of this work and present our results on the test dataset. Our system achieved a MAP of 0.0217 on the test set and placed third in the rankings.

CCS Concepts: • Information systems → Multilingual and cross-lingual retrieval; • Computing methodologies → Natural language processing
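The abstract reports retrieval quality as mean average precision (MAP). As a reminder of what that figure measures, here is a minimal sketch of the standard MAP definition in Python. It is not the official MSIR@FIRE-2016 evaluation script, and the query IDs and tweet IDs below are hypothetical. Average precision (AP) for one query averages precision@k over the ranks k at which a relevant tweet appears and divides by the total number of relevant tweets, so relevant tweets that are never retrieved lower the score. MAP is the mean of AP over all judged queries.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """AP for one query.

    Sums precision@k at every rank k holding a relevant item and divides by
    the total number of relevant items, so unretrieved relevant items count
    as zero. Assumes each ID appears at most once in `ranked`.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / k  # precision at rank k
    return precision_sum / len(relevant)


def mean_average_precision(run: Dict[str, List[str]],
                           qrels: Dict[str, Set[str]]) -> float:
    """MAP over every judged query; a query absent from the run scores 0."""
    if not qrels:
        return 0.0
    total = sum(average_precision(run.get(q, []), rel) for q, rel in qrels.items())
    return total / len(qrels)


# Hypothetical ranked runs and relevance judgments for two queries.
run = {"q1": ["t3", "t1", "t7"], "q2": ["t9", "t4"]}
qrels = {"q1": {"t1", "t7"}, "q2": {"t5"}}

print(round(mean_average_precision(run, qrels), 4))  # → 0.2917
```

Here q1 scores (1/2 + 2/3) / 2 = 7/12 and q2 scores 0 because its only relevant tweet is never retrieved, giving a MAP of 7/24. Under this definition, a MAP as low as 0.0217 means that relevant tweets were, on average, either ranked far down the result lists or not retrieved at all.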


Similar resources

Code Mixed Entity Extraction in Indian Languages using Neural Networks

In this paper we present our submission for FIRE 2016 Shared Task on Code Mixed Entity Extraction in Indian Languages. We describe a Neural Network system for Entity Extraction in Hindi-English Code Mixed text. Our method uses distributed word representations as features for the Neural Network and therefore, can easily be replicated across languages. Our system ranked first place for Hindi-Engl...

NLP-NITMZ @ MSIR 2016 System for Code-Mixed Cross-Script Question Classification

This paper describes our approach to the Code-Mixed Cross-Script Question Classification task, which is subtask 1 of MSIR 2016. MSIR is a Mixed Script Information Retrieval event held in conjunction with FIRE 2016, the 8th meeting of the Forum for Information Retrieval Evaluation. For this task, our team NLP-NITMZ submitted three system runs: i) using a direct feature set; ii) using dire...

ISM@FIRE-2015: Mixed Script Information Retrieval

This paper describes the approach we used in FIRE-2015 for identifying the languages of a set of terms written in Roman script, and our approaches for retrieval in the mixed-script domain. The first approach identifies the class of the given terms/words (the native language of each term, and whether a term is a named entity or of some other type). MaxEnt, a supervised classifier, has been used for the cl...

Amrita-CEN@MSIR-FIRE2016: Code-Mixed Question Classification using BoWs and RNN Embeddings

Question classification is a key task in many question answering applications. Nearly all previous work on question classification has used machine learning and knowledge-based methods. This working note presents an embedding-based Bag-of-Words method and a Recurrent Neural Network for automatic question classification in code-mixed Bengali-English text. We build two systems that clas...

CEN@Amrita FIRE 2016: Context based Character Embeddings for Entity Extraction in Code-Mixed Text

This paper presents the working methodology and results for the Code Mix Entity Extraction in Indian Languages (CMEE-IL) shared task of FIRE-2016. The aim of the task is to identify various entities, such as person, organization, movie, and location names, in given code-mixed tweets. The code-mixed tweets are written in English mixed with Hindi or Tamil. In this work, Entity Extraction system ...

Publication date: 2016
